Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task that frees practitioners from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise segmentation performance on the target domain. A key idea for tackling this problem is to perform image-level and feature-level adaptation jointly. Unfortunately, such unified approaches are lacking for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; we further regularize category centers in the source domain through a category-oriented triplet loss, and perform consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. On the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method, using DeepLab V3+ as the backbone, surpasses the previous state of the art by 8%, achieving 58.2% mIoU.
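As a rough illustration of the feature-level component, below is a minimal sketch of a category-oriented triplet loss over pixel features, assuming per-category feature centers are already computed; the function name, margin, and hard-negative mining scheme are our own illustrative choices, not the paper's exact formulation.

```python
# Minimal sketch of a category-oriented triplet loss over pixel features.
# The margin and the hardest-negative-center mining are illustrative
# assumptions; the paper's exact formulation may differ.
import torch
import torch.nn.functional as F

def category_triplet_loss(features, labels, centers, margin=0.5):
    """Pull each pixel feature toward its own category center (positive)
    and push it away from the nearest other center (hard negative).

    features: (N, D) pixel features, labels: (N,) category ids,
    centers:  (C, D) per-category feature centers.
    """
    pos = centers[labels]                                 # (N, D) matching centers
    d_pos = (features - pos).pow(2).sum(dim=1)            # squared distance to own center
    d_all = torch.cdist(features, centers).pow(2)         # (N, C) distances to all centers
    d_all.scatter_(1, labels.unsqueeze(1), float("inf"))  # mask out own category
    d_neg = d_all.min(dim=1).values                       # hardest negative center
    return F.relu(d_pos - d_neg + margin).mean()

# Toy usage: 8 pixel features of dimension 16, 5 categories.
feats = torch.randn(8, 16)
labs = torch.randint(0, 5, (8,))
ctrs = torch.randn(5, 16)
print(category_triplet_loss(feats, labs, ctrs))
```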
Deep learning-based models dominate the current landscape of production recommender systems. Moreover, recent years have witnessed exponential growth in model scale: from Google's 2016 model with 1 billion parameters to the latest Facebook model with 12 trillion parameters. Every jump in model capacity has brought significant quality improvements, which leads us to believe that the era of 100-trillion-parameter models is coming. However, training such models is challenging even within industrial-scale data centers. The difficulty is inherited from the staggering heterogeneity of the training computation: the model's embedding layer can account for 99.99% of the total model size and is extremely memory-intensive, while the rest of the neural network is increasingly computation-intensive. To support the training of such huge models, an efficient distributed training system is urgently needed. In this paper, we tackle this challenge by carefully co-designing the optimization algorithm and the distributed system architecture. Specifically, to ensure both training efficiency and training accuracy, we design a novel hybrid training algorithm in which the embedding layer and the dense neural network are handled by different synchronization mechanisms; we then build a system named Persia (short for parallel recommendation training system with hybrid acceleration) to support this hybrid training algorithm. Both theoretical demonstrations and empirical studies, scaled up to 100 trillion parameters, validate Persia's system design and implementation. We make Persia publicly available (at https://github.com/persiamml/persia) so that anyone can easily train recommendation models at the scale of 100 trillion parameters.
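To make the hybrid idea concrete, here is a toy, single-process sketch in which the dense parameters follow a synchronous allreduce-style average while embedding rows are updated asynchronously per worker; the worker count, sizes, and learning rate are illustrative assumptions, and this is not Persia's actual implementation.

```python
# Toy sketch of Persia-style hybrid synchronization (not the actual system):
# dense parameters are averaged synchronously every step, while each worker
# applies its sparse embedding-row gradients asynchronously (no averaging).
# Worker count, sizes, and learning rate are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_workers, lr = 4, 0.1
dense = rng.normal(size=8)             # small compute-heavy part, kept in sync
embedding = rng.normal(size=(100, 8))  # huge memory-heavy part, updated async

for step in range(3):
    # Synchronous path: average dense gradients across workers (allreduce).
    dense_grads = [rng.normal(size=dense.shape) for _ in range(n_workers)]
    dense -= lr * np.mean(dense_grads, axis=0)

    # Asynchronous path: each worker touches only the few embedding rows in
    # its mini-batch and applies its update without waiting for the others.
    for w in range(n_workers):
        rows = rng.choice(100, size=2, replace=False)
        embedding[rows] -= lr * rng.normal(size=(2, embedding.shape[1]))

print(dense[:3], embedding[0, :3])
```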
With the development of a series of galaxy sky surveys in recent years, observations have increased rapidly, making machine learning methods for galaxy image recognition a hot research topic. Existing automatic galaxy image recognition studies are plagued by large differences in similarity between categories, the imbalance of data between different classes, and the discrepancy between the discrete representation of galaxy classes and the essentially gradual change from one morphological class to the adjacent class (DDRGC). These limitations have motivated several astronomers and machine learning experts to design projects with improved galaxy image recognition capabilities. Therefore, this paper proposes a novel learning method, ``Hierarchical Imbalanced data learning with Weighted sampling and Label smoothing" (HIWL). HIWL consists of three key techniques, each dealing with one of the above-mentioned problems: (1) a hierarchical galaxy classification model based on an efficient backbone network; (2) a weighted sampling scheme to deal with the imbalance problem; (3) a label smoothing technique to alleviate the DDRGC problem. We applied this method to galaxy photometric images from the Galaxy Zoo - The Galaxy Challenge, exploring the recognition of completely round smooth, in-between smooth, cigar-shaped, edge-on, and spiral galaxies. The overall classification accuracy is 96.32\%, and the superiority of HIWL is shown in terms of recall, precision, and F1-score in comparison with related works. In addition, we also explore visualizations of the galaxy image features and model attention to understand the foundations of the proposed scheme.
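Techniques (2) and (3) map onto standard PyTorch utilities; a minimal sketch follows, with the class counts and the 0.1 smoothing factor as placeholder assumptions rather than HIWL's published settings.

```python
# Sketch of techniques (2) and (3) with standard PyTorch utilities:
# WeightedRandomSampler to rebalance classes and CrossEntropyLoss with
# label smoothing. Class counts and the 0.1 factor are placeholder
# assumptions, not HIWL's published settings.
import torch
from torch.utils.data import TensorDataset, DataLoader, WeightedRandomSampler

# Imbalanced toy data: class counts mimic a skewed galaxy dataset.
counts = torch.tensor([800.0, 150.0, 30.0, 15.0, 5.0])
labels = torch.cat([torch.full((int(c),), i) for i, c in enumerate(counts)]).long()
images = torch.randn(len(labels), 3, 32, 32)

# (2) Weighted sampling: each sample is drawn inversely to its class frequency.
sample_weights = (1.0 / counts)[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels))
loader = DataLoader(TensorDataset(images, labels), batch_size=64, sampler=sampler)

# (3) Label smoothing softens the discrete class targets, matching the
# gradual transitions between adjacent morphological classes.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

xb, yb = next(iter(loader))
logits = torch.randn(xb.size(0), 5)  # stand-in for the backbone's output
print(criterion(logits, yb), torch.bincount(yb, minlength=5))
```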
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
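For reference, a minimal usage sketch with the Hugging Face transformers library follows; since the full 176B checkpoint requires a multi-GPU setup, the sketch assumes the smaller bigscience/bloom-560m checkpoint from the same BLOOM family.

```python
# Minimal usage sketch with the Hugging Face transformers library. Loading
# the full 176B checkpoint requires a multi-GPU setup, so this assumes the
# smaller bloom-560m model from the same BLOOM family.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```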
We discuss a platform that has both software and hardware components, and whose purpose is to support research into characterizing and mitigating the sim-to-real gap in robotics and vehicle autonomy engineering. The software is operating-system independent and has three main components: a simulation engine called Chrono, which supports high-fidelity vehicle and sensor simulation; an autonomy stack for algorithm design and testing; and a development environment that supports visualization and hardware-in-the-loop experimentation. The accompanying hardware platform is a 1/6th scale vehicle augmented with reconfigurable mountings for computing, sensing, and tracking. Since this vehicle platform has a digital twin within the simulation environment, one can test the same autonomy perception, state estimation, or controls algorithms, as well as the processors they run on, in both simulation and reality. A demonstration is provided to show the utilization of this platform for autonomy research. Future work will concentrate on augmenting ART/ATK with support for a full-sized Chevy Bolt EUV, which will be made available to this group in the immediate future.
Maximizing a monotone submodular function is a fundamental task in machine learning, economics, and statistics. In this paper, we present two communication-efficient decentralized online algorithms for the monotone continuous DR-submodular maximization problem, both of which reduce the number of gradient evaluations per function and the per-round communication complexity from $T^{3/2}$ to $1$. The first one, the one-shot decentralized Meta-Frank-Wolfe (Mono-DMFW) algorithm, achieves a $(1-1/e)$-regret bound of $O(T^{4/5})$. To the best of our knowledge, this is the first one-shot and projection-free decentralized online algorithm for monotone continuous DR-submodular maximization. Next, inspired by the non-oblivious boosting function \citep{zhang2022boosting}, we propose the Decentralized Online Boosting Gradient Ascent (DOBGA) algorithm, which attains a $(1-1/e)$-regret of $O(\sqrt{T})$. To the best of our knowledge, this is the first result to obtain the optimal $O(\sqrt{T})$ regret against a $(1-1/e)$-approximation with only one gradient inquiry for each local objective function per step. Finally, various experimental results confirm the effectiveness of the proposed methods.
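To illustrate the projection-free machinery these algorithms build on, here is a toy single-machine Frank-Wolfe (continuous greedy) sketch on a probabilistic-coverage objective; the objective, budget, and step count are illustrative assumptions, and this is not the decentralized Mono-DMFW or DOBGA algorithm itself.

```python
# Toy, single-machine Frank-Wolfe (continuous greedy) sketch for monotone
# DR-submodular maximization, illustrating the projection-free step the
# decentralized algorithms build on. The coverage objective, budget k, and
# step count T are illustrative assumptions, not the paper's setting.
import numpy as np

rng = np.random.default_rng(1)
n, m, k, T = 10, 30, 3, 100
P = rng.uniform(0.0, 0.9, size=(n, m))   # P[i, j]: item i covers element j
w = rng.uniform(1.0, 2.0, size=m)        # element weights

def f(x):
    # Probabilistic coverage: monotone and DR-submodular on [0, 1]^n.
    return float(w @ (1.0 - np.prod(1.0 - P * x[:, None], axis=0)))

def grad(x):
    miss = 1.0 - P * x[:, None]          # (n, m), strictly positive here
    total = np.prod(miss, axis=0)        # (m,) prob. element j stays uncovered
    return (P * (total / miss)) @ w      # d f / d x_i

x = np.zeros(n)
for _ in range(T):
    g = grad(x)
    v = np.zeros(n)
    v[np.argsort(g)[-k:]] = 1.0          # linear oracle over {sum(v) <= k}
    x += v / T                           # Frank-Wolfe step, no projection
print(round(f(x), 3), x.round(2))
```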
We describe a software framework and a hardware platform used in tandem for the design and analysis of robot autonomy algorithms in simulation and reality. The software, which is open source, containerized, and operating-system (OS) independent, has three main components: a ROS 2 interface to a C++ vehicle simulation framework (Chrono), which provides high-fidelity wheeled/tracked vehicle and sensor simulation; a basic ROS 2-based autonomy stack for algorithm design and testing; and a development ecosystem that enables visualization and hardware-in-the-loop experimentation in perception, state estimation, path planning, and control. The accompanying hardware platform is a 1/6-scale vehicle augmented with reconfigurable mountings for computing, sensing, and tracking. Its purpose is to allow algorithms and sensor configurations to be physically tested and improved. Since this vehicle platform has a digital twin within the simulation environment, one can test and compare the same algorithms and autonomy stack in simulation and reality. The platform is built with an eye toward characterizing and managing the simulation-to-reality gap. Herein, we describe how the platform is set up, deployed, and used to improve autonomy for mobility applications.
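As a schematic of the digital-twin workflow, the sketch below runs one controller against a toy kinematic bicycle model; calling the identical controller from the real vehicle's ROS 2 node and comparing logged traces is what quantifies the sim-to-real gap. The plant model, gains, and wheelbase are illustrative assumptions, not ART/ATK's actual interfaces.

```python
# Hedged sketch of the digital-twin testing pattern: the SAME controller
# runs against a simulated plant and, in reality, inside the vehicle's
# ROS 2 node; differing closed-loop traces expose the sim-to-real gap.
# The bicycle model, gains, and 0.4 m wheelbase are illustrative assumptions.
import math

def controller(lateral_error, heading_error, kp=1.5, kd=0.6):
    """Steering command shared verbatim between simulation and reality."""
    return -kp * lateral_error - kd * heading_error

def simulate(steps=50, dt=0.1, v=2.0, wheelbase=0.4):
    y, psi, trace = 0.5, 0.0, []  # start with a 0.5 m lateral offset
    for _ in range(steps):
        delta = max(-0.5, min(0.5, controller(y, psi)))  # steering limits
        psi += dt * v / wheelbase * math.tan(delta)      # kinematic bicycle
        y += dt * v * math.sin(psi)
        trace.append(y)
    return trace

sim_trace = simulate()
print(f"final lateral error in sim: {sim_trace[-1]:.3f} m")
# Comparing sim_trace against the trace logged on the physical vehicle
# quantifies the sim-to-real gap for this controller.
```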
Electronic Health Records (EHRs) are an essential part of modern healthcare systems, influencing healthcare delivery, operations, and research. Despite the structured information in EHRs, unstructured text has attracted much attention and has become an exciting research field. The success of recent neural Natural Language Processing (NLP) methods has led to new directions for processing unstructured clinical notes. In this work, we create a Python library for clinical texts, EhrKit. This library contains two main parts: MIMIC-III-specific functions and task-specific functions. The first part introduces a list of interfaces for accessing MIMIC-III NOTEEVENTS data, including basic search, information retrieval, and information extraction. The second part integrates many third-party libraries for up to 12 off-the-shelf NLP tasks, such as named entity recognition, summarization, machine translation, and more.
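The flavor of the MIMIC-III-specific part can be illustrated with plain pandas over the public NOTEEVENTS schema; the sketch below is not EhrKit's actual API, and the CSV path and example subject ID are placeholder assumptions.

```python
# Illustration of the kind of basic MIMIC-III NOTEEVENTS search interface
# the first part of the library describes. This is NOT EhrKit's actual API;
# it is a plain-pandas sketch, the CSV path and subject ID are placeholders,
# and the columns (SUBJECT_ID, CATEGORY, TEXT) follow the public
# MIMIC-III schema.
import pandas as pd

notes = pd.read_csv("NOTEEVENTS.csv", usecols=["SUBJECT_ID", "CATEGORY", "TEXT"])

def search_notes(subject_id, keyword=None, category=None):
    """Basic search: filter a patient's notes by category and keyword."""
    hits = notes[notes["SUBJECT_ID"] == subject_id]
    if category is not None:
        hits = hits[hits["CATEGORY"] == category]
    if keyword is not None:
        hits = hits[hits["TEXT"].str.contains(keyword, case=False, na=False)]
    return hits

print(search_notes(109, keyword="diabetes", category="Discharge summary").head())
```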
Recent years have witnessed the resurgence of knowledge engineering, featured by the rapid growth of knowledge graphs. However, most existing knowledge graphs are represented with pure symbols, which hurts a machine's capability to understand the real world. The multi-modalization of knowledge graphs is an inevitable key step towards the realization of human-level machine intelligence. The results of this endeavor are Multi-modal Knowledge Graphs (MMKGs). In this survey of MMKGs constructed from texts and images, we first give definitions of MMKGs, followed by preliminaries on multi-modal tasks and techniques. We then systematically review the challenges, progress, and opportunities in the construction and application of MMKGs, with detailed analyses of the strengths and weaknesses of different solutions. We conclude this survey with open research problems relevant to MMKGs.
Factual inconsistency in generated summaries severely limits the practical applications of abstractive dialogue summarization. Although significant progress has been achieved by using pre-trained models, substantial amounts of hallucinated content have been found during human evaluation. Pre-trained models are most commonly fine-tuned with cross-entropy loss for text summarization, which may not be an optimal strategy. In this work, we provide a typology of factual errors with annotated data, to highlight the types of errors and move away from a binary understanding of factuality. We further propose a training strategy that improves the factual consistency and overall quality of summaries via a novel contrastive fine-tuning. Based on our linguistically-informed typology of errors, we design different modular objectives for each type. Specifically, we utilize hard negative samples with errors to reduce the generation of factual inconsistency. In order to capture the key information between speakers, we also design a dialogue-specific loss. Using human evaluation and automatic faithfulness metrics, we show that our model significantly reduces various kinds of factual errors on the SAMSum dialogue summarization corpus. Moreover, our model generalizes to the AMI meeting summarization corpus, where it produces significantly higher scores than the baselines on both datasets with respect to word-overlap metrics.
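A generic sketch of the contrastive idea follows: score the reference summary above hard negatives with injected factual errors by a margin. The encoder stand-in, cosine scoring, and margin are illustrative assumptions, not the paper's exact modular objectives.

```python
# Generic sketch of contrastive fine-tuning with hard negatives: score the
# reference summary above corrupted ("hallucinated") variants by a margin.
# The encoder stand-in, margin, and cosine scoring are illustrative
# assumptions, not the paper's exact modular objectives.
import torch
import torch.nn.functional as F

def contrastive_summary_loss(pos_emb, neg_embs, ctx_emb, margin=1.0):
    """pos_emb: (D,) reference summary embedding; neg_embs: (K, D) embeddings
    of hard negatives with injected factual errors; ctx_emb: (D,) dialogue
    context embedding used for scoring."""
    pos_score = F.cosine_similarity(pos_emb, ctx_emb, dim=0)
    neg_scores = F.cosine_similarity(neg_embs, ctx_emb.unsqueeze(0), dim=1)
    # Hinge on each hard negative: push its score below the positive's.
    return F.relu(margin - pos_score + neg_scores).mean()

# Toy usage with random embeddings standing in for an encoder's outputs.
d = 32
loss = contrastive_summary_loss(torch.randn(d), torch.randn(4, d), torch.randn(d))
print(loss)
```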